End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification
Autoregressive decoding is the only part of sequence-to-sequence models that
prevents them from massive parallelization at inference time.
Non-autoregressive models enable the decoder to generate all output symbols
independently in parallel. We present a novel non-autoregressive architecture
based on connectionist temporal classification and evaluate it on the task of
neural machine translation. Unlike other non-autoregressive methods which
operate in several steps, our model can be trained end-to-end. We conduct
experiments on the WMT English-Romanian and English-German datasets. Our models
achieve a significant speedup over the autoregressive models, keeping the
translation quality comparable to other non-autoregressive models.
Comment: EMNLP 201
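The abstract above relies on connectionist temporal classification, whose decoder emits one (possibly blank) symbol per position in parallel and then collapses the result. A minimal sketch of the greedy collapse step, assuming a blank index of 0 (the function name and blank convention are illustrative, not taken from the paper):

```python
import itertools

def ctc_collapse(token_ids, blank_id=0):
    """Collapse a CTC output sequence: merge adjacent repeats, then drop blanks."""
    deduped = [t for t, _ in itertools.groupby(token_ids)]
    return [t for t in deduped if t != blank_id]

# Greedy CTC decoding takes the per-position argmax and collapses it;
# blanks let the model emit the same token twice in a row.
print(ctc_collapse([0, 7, 7, 0, 7, 3, 3, 0]))  # [7, 7, 3]
```

Because every position is predicted independently, the argmax over positions can run fully in parallel; only this cheap collapse is sequential.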
Attention Strategies for Multi-Source Sequence-to-Sequence Learning
Modeling attention in neural multi-source sequence-to-sequence learning
remains a relatively unexplored area, despite its usefulness in tasks that
incorporate multiple source languages or modalities. We propose two novel
approaches to combine the outputs of attention mechanisms over each source
sequence, flat and hierarchical. We compare the proposed methods with existing
techniques and present results of systematic evaluation of those methods on the
WMT16 Multimodal Translation and Automatic Post-editing tasks. We show that the
proposed methods achieve competitive results on both tasks.
Comment: 7 pages; Accepted to ACL 201
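The flat and hierarchical combinations mentioned above can be sketched numerically: flat attention runs one softmax over the union of all sources' positions, while hierarchical attention first builds one context per source and then attends over those contexts. The function names and the separation of scores from states are assumptions for illustration, not the paper's code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def flat_attention(scores_per_source, states_per_source):
    """Flat: a single softmax over the concatenation of all sources' positions."""
    weights = softmax(np.concatenate(scores_per_source))
    return weights @ np.vstack(states_per_source)

def hierarchical_attention(scores_per_source, states_per_source, source_scores):
    """Hierarchical: one context vector per source, then a second attention
    (with its own scores) over the per-source contexts."""
    contexts = np.vstack([softmax(s) @ h
                          for s, h in zip(scores_per_source, states_per_source)])
    return softmax(source_scores) @ contexts
```

The practical difference: flat attention lets positions of different sources compete directly in one distribution, while the hierarchical variant decides separately how much to trust each source as a whole.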
CUNI System for the WMT17 Multimodal Translation Task
In this paper, we describe our submissions to the WMT17 Multimodal
Translation Task. For Task 1 (multimodal translation), our best scoring system
is a purely textual neural translation of the source image caption to the
target language. The main feature of the system is the use of additional data
that was acquired by selecting similar sentences from parallel corpora and by
data synthesis with back-translation. For Task 2 (cross-lingual image
captioning), our best submitted system generates an English caption which is
then translated by the best system used in Task 1. We also present negative
results, which are based on ideas that we believe have the potential to bring improvements but did not prove useful in our particular setup.
Comment: 8 pages; Camera-ready submission to WMT1
CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval
We present the Charles University system for the MRL~2023 Shared Task on
Multi-lingual Multi-task Information Retrieval. The goal of the shared task was
to develop systems for named entity recognition and question answering in
several under-represented languages. Our solutions to both subtasks rely on the
translate-test approach. We first translate the unlabeled examples into English
using a multilingual machine translation model. Then, we run inference on the
translated data using a strong task-specific model. Finally, we project the
labeled data back into the original language. To keep the inferred tags on the
correct positions in the original language, we propose a method based on
scoring the candidate positions using a label-sensitive translation model. In
both settings, we experiment with finetuning the classification models on the
translated data. However, due to a domain mismatch between the development data
and the shared task validation and test sets, the finetuned models could not
outperform our baselines.
Comment: 8 pages, 2 figures; System description paper at the MRL 2023 workshop at EMNLP 202
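The label-projection step described above scores candidate positions in the original sentence and keeps the best one. A minimal sketch of that selection, where `score_fn` is a stand-in for the label-sensitive translation model (both the function names and the exhaustive span enumeration are illustrative assumptions, not the system's actual implementation):

```python
def project_entity(src_tokens, score_fn):
    """Return the span (i, j) of src_tokens that the label-sensitive scorer
    ranks highest as the position of the projected entity tag."""
    candidates = [(i, j)
                  for i in range(len(src_tokens))
                  for j in range(i + 1, len(src_tokens) + 1)]
    return max(candidates, key=lambda span: score_fn(src_tokens, span))

# Toy scorer: prefer the span that exactly covers a known entity.
tokens = ["Univerzita", "Karlova", "v", "Praze"]
gold = (0, 2)
score = lambda toks, span: 1.0 if span == gold else 0.0
print(project_entity(tokens, score))  # (0, 2)
```

In practice the scorer would compare translations of the sentence with the tag markers inserted at each candidate position, so the projection inherits the translation model's notion of which source words produced the tagged English words.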
CUNI System for the WMT18 Multimodal Translation Task
We present our submission to the WMT18 Multimodal Translation Task. The main
feature of our submission is applying a self-attentive network instead of a
recurrent neural network. We evaluate two methods of incorporating the visual
features in the model: first, we include the image representation as another
input to the network; second, we train the model to predict the visual features
and use it as an auxiliary objective. For our submission, we acquired both
textual and multimodal additional data. Both of the proposed methods yield
significant improvements over recurrent networks and self-attentive textual
baselines.
Comment: Published at WMT1
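The second method above treats the visual features as an auxiliary prediction target. A hedged sketch of such a joint objective: translation cross-entropy plus a weighted regression term toward the image features (the function name, shapes, and `aux_weight` value are assumptions for illustration, not the submission's exact loss):

```python
import numpy as np

def multitask_loss(token_log_probs, target_ids, pred_feats, image_feats,
                   aux_weight=0.1):
    """Translation cross-entropy plus an auxiliary MSE term that trains the
    model to predict the image's visual features from the text."""
    ce = -np.mean([token_log_probs[t, y] for t, y in enumerate(target_ids)])
    mse = np.mean((pred_feats - image_feats) ** 2)
    return ce + aux_weight * mse
```

The auxiliary term only shapes the shared representations during training; at inference time the model translates from text alone, so no image is required.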
Input Combination Strategies for Multi-Source Transformer Decoder
In multi-source sequence-to-sequence tasks, the attention mechanism can be
modeled in several ways. This topic has been thoroughly studied on recurrent
architectures. In this paper, we extend the previous work to the
encoder-decoder attention in the Transformer architecture. We propose four
different input combination strategies for the encoder-decoder attention:
serial, parallel, flat, and hierarchical. We evaluate our methods on tasks of
multimodal translation and translation with multiple source languages. The
experiments show that the models are able to use multiple sources and improve
over single source baselines.
Comment: Published at WMT1
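Two of the four strategies named above can be sketched with plain scaled dot-product attention: the parallel variant attends to each source independently and sums the contexts, while the serial variant stacks the cross-attentions so each one reads the previous output. This is a simplified sketch (single head, no projections, residual connections and layer norm omitted), not the paper's Transformer code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(queries, states):
    """Scaled dot-product cross-attention; states serve as keys and values."""
    scores = queries @ states.T / np.sqrt(states.shape[-1])
    return softmax(scores) @ states

def parallel_combination(queries, sources):
    """Parallel: attend to each source independently and sum the contexts."""
    return sum(attend(queries, s) for s in sources)

def serial_combination(queries, sources):
    """Serial: stack the cross-attentions, each reading the previous output."""
    out = queries
    for s in sources:
        out = attend(out, s)
    return out
```

The flat and hierarchical strategies from the earlier recurrent-network work carry over analogously: flat merges all sources' positions into one attention distribution, hierarchical adds a second attention over per-source contexts.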
Non-Autoregressive Neural Machine Translation (Neautoregresivní neuronový strojový překlad)
In recent years, a number of methods for improving the decoding speed of neural machine translation systems have emerged. One of the approaches that propose fundamental changes to the model architecture is non-autoregressive modeling. In standard autoregressive models, the output token distributions are conditioned on the previously decoded outputs. This conditional dependence allows the model to keep track of the state of the decoding process, which improves the fluency of the output. On the other hand, it requires the neural network computation to be run sequentially, and thus it cannot be parallelized. Non-autoregressive models impose conditional independence on the output distributions, which means that the decoding process is parallelizable and hence the decoding speed improves. A major drawback of this approach is lower translation quality compared to the autoregressive models. The goal of non-autoregressive translation research is to find methods that improve the translation quality while retaining high decoding speed. In this thesis, we survey the research progress so far and identify flaws in the generally accepted evaluation methodology. We experiment with non-autoregressive models trained with connectionist temporal classification. We find that even though our models achieve the best results among non-autoregressive models on the WMT 2014 data, when compared with the latest...
Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics